1 Packages and the Data

Required packages and the data are as follows:

library(ggplot2)
library(dplyr)
library(tidyverse)
library(data.table)
library(plotly)
library(RColorBrewer)
library(FactoMineR)
library(hrbrthemes)
library(RColorBrewer)

train_x<-fread( file= "data/UWave-20201126T225338Z-001/UWave/uWaveGestureLibrary_X_TRAIN", sep=" ")
train_y<-fread( file= "data/UWave-20201126T225338Z-001/UWave/uWaveGestureLibrary_Y_TRAIN", sep=" ")
train_z<-fread( file= "data/UWave-20201126T225338Z-001/UWave/uWaveGestureLibrary_Z_TRAIN", sep=" ")

2 Data Manuplation

Raw data is prepared for the analysis and the long format is obtained to be used in some parts.

train_x_df<-as.data.frame(train_x)
train_y_df<-as.data.frame(train_y)
train_z_df<-as.data.frame(train_z)

## formatting x data

names(train_x_df)[1]<-"class"
names(train_x_df)[2:316]<-1:315

train_x_df<-train_x_df%>%
  mutate(time_series_id=1:896)%>%
  mutate(class=factor(ifelse(class==1, "a",
                      ifelse(class==2, "b",
                             ifelse(class==3, "c",
                                    ifelse(class==4, "d",
                                           ifelse(class==5,"e",
                                                  ifelse(class==6,"f",
                                                         ifelse(class==7, "g",
                                                                "h"))))))),
                      levels=c("a","b","c","d","e","f","g","h")))
                                                                      
x_raw<-train_x_df[-c(1,317)] %>% setNames(paste0('X.', names(.)))
y_raw<-train_y_df[-c(1,317)] %>% setNames(paste0('Y.', names(.)))
z_raw<-train_z_df[-c(1,317)] %>% setNames(paste0('Z.', names(.)))
mds_data<-cbind(train_x_df[c(317,1)],x_raw,y_raw,z_raw)

x_formatted<-pivot_longer(
  train_x_df,
  cols=-c(time_series_id,class),
  names_to = "time_index",
  values_to = "X",
)

x_formatted<-x_formatted%>%transform(time_index=as.integer(time_index))%>%relocate(class, .after=X)%>%arrange(time_series_id,class)

## formatting y data
train_y_df<-as.data.frame(train_y)

names(train_y_df)[1]<-"class"
names(train_y_df)[2:316]<-1:315

train_y_df<-train_y_df%>%
  mutate(time_series_id=1:896)%>%
  mutate(class=factor(ifelse(class==1, "a",
                             ifelse(class==2, "b",
                                    ifelse(class==3, "c",
                                           ifelse(class==4, "d",
                                                  ifelse(class==5,"e",
                                                         ifelse(class==6,"f",
                                                                ifelse(class==7, "g",
                                                                       "h"))))))),
                      levels=c("a","b","c","d","e","f","g","h")))

y_formatted<-pivot_longer(
  train_y_df,
  cols=-c(time_series_id,class),
  names_to = "time_index",
  values_to = "Y",
)

y_formatted<-y_formatted%>%transform(time_index=as.integer(time_index))%>%relocate(class, .after=Y)%>%
  arrange(time_series_id,class)

## formatting z data

names(train_z_df)[1]<-"class"
names(train_z_df)[2:316]<-1:315

train_z_df<-train_z_df%>%
  mutate(time_series_id=1:896)%>%
  mutate(class=factor(ifelse(class==1, "a",
                             ifelse(class==2, "b",
                                    ifelse(class==3, "c",
                                           ifelse(class==4, "d",
                                                  ifelse(class==5,"e",
                                                         ifelse(class==6,"f",
                                                                ifelse(class==7, "g",
                                                                       "h"))))))),
                      levels=c("a","b","c","d","e","f","g","h")))


z_formatted<-pivot_longer(
  train_z_df,
  cols=-c(time_series_id,class),
  names_to = "time_index",
  values_to = "Z",
)

z_formatted<-z_formatted%>%transform(time_index=as.integer(time_index))%>%
  relocate(class, .after=Z)%>%
  arrange(time_series_id,class)

combined<-cbind(x_formatted,y_formatted,z_formatted)
combined<-combined[c(1,2,3,7,11,12)]

ts_Position<-combined %>% 
  group_by(time_series_id) %>% 
  mutate(X_velocity = cumsum(X),
         Y_velocity = cumsum(Y),
         Z_velocity = cumsum(Z),
         X_position = cumsum(X_velocity),
         Y_position = cumsum(Y_velocity),
         Z_position = cumsum(Z_velocity))

3 PART A: Data Visualization

Firstly, the acceleration values are summed up cumulatively to get the velocity values. These velocity values obtained for three axes are plotted for 8 gesture types. The colors in the graphics change over time. The movement that starts with purple dots ends with a yellow point, and the colors between these colors are scaled again depending on the time. the yellow dot here can be thought as the end of the arrow. It is seen that the starting and ending points overlap by looking at all the plots showing the velocity values. However, this is not the case with the gesture types defined. For this reason, position values need to be calculated in order to associate these gesture types with data. The velocity values are added up cumulatively again to obtain the position values. In the last case, position values are obtained for all axes and when these values are plotted, results similar to the defined gesture types are obtained. One thing I pay attention to while establishing this relationship is that the defined types are 2-dimensional, but the position plots I get are 3-dimensional. Since there is such a difference, I made an assumption and thought as follows: If the 2-dimensional route defined can be seen from any angle on the 3-dimensional position plots, they are compatible. When this logic is applied for 8 gestures, it can be seen that the resulting position plots fit the defined gesture types. For a clearer view, plots can be zoomed in and out. Also, the resulting shape can be rotated, axis values can be viewed by clicking on it.

ts_id<-list(list(11,"a"),list(15,"b"),list(4,"c"),list(5,"d"),list(2,"e"),list(1,"f"),list(7,"g"),list(6,"h"))


  data<-ts_Position%>%filter(time_series_id==ts_id[[1]][[1]],class==ts_id[[1]][[2]])

  plot1<-plot_ly(data,
          x=~X_velocity, y=~Y_velocity, z=~Z_velocity,color =~ceiling(time_index/100),
          type="scatter3d", mode="markers") %>% 
    layout(title = paste("Velocity Chart for a Gesture from Class",ts_id[[1]][[2]]))
  
  plot1
  plot2<-plot_ly(data,
          x=~X_position, y=~Y_position, z=~Z_position,
          color =~ceiling(time_index/100),
          type="scatter3d", mode="markers")%>% 
    layout(title = paste("Position Chart for a Gesture from Class",ts_id[[1]][[2]]))
  
  plot2

4 PART B: Principal Component Analysis and Visualization of the Results

Since the position data is found in the previous part compatible, the operations are performed using the position values in this and the following parts. Using the position data of the x, y and z axes of all classes, PCA is performed and eigen vectors are obtained. As a result of PCA, it would be proper to take the first two components that give approximately 90 percent cumulative proportion, as it is desired to maintain the variance in the data by obtaining a high variance as possible. However, since the data is wanted to be reduced from 3 dimensional to 1 dimensional, only the first component is taken.

In the first chart, it is shown how much the variance obtained expresses the variance of the data.

In the second chart, the shares of the x, y and z axes that constitute the first two principal components are visualized.

##PART B
###PCA for all series combined

cor(ts_Position[,c(10,11,12)])
##            X_position Y_position Z_position
## X_position  1.0000000  0.2732331 -0.1033031
## Y_position  0.2732331  1.0000000  0.5935095
## Z_position -0.1033031  0.5935095  1.0000000
pca_all <- princomp(as.matrix(ts_Position[,c(10,11,12)]),cor=T)

summary(pca_all,loadings=TRUE)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3
## Standard deviation     1.2724995 1.0380930 0.5505524
## Proportion of Variance 0.5397517 0.3592124 0.1010360
## Cumulative Proportion  0.5397517 0.8989640 1.0000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.209  0.908  0.362
## Y_position  0.723  0.105 -0.683
## Z_position  0.658 -0.404  0.635
PCA_vis<-PCA(ts_Position[,c(10,11,12)])

PCA_vis$eig
##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.6192550               53.97517                          53.97517
## comp 2  1.0776371               35.92124                          89.89640
## comp 3  0.3031079               10.10360                         100.00000

Since it is reduced to a single variable, each instance can be plotted as time series. It can be concretized how well the component defines the data and classes by selecting 2 random instances from each class and checking their compatibility.

ts_Position_uni<-cbind(ts_Position,PC1=pca_all$scores[,1])


classes<-c("a","b","c","d","e","f","g","h")

for(i in 1:8){
  PCA_all_ts_a<-ts_Position_uni%>%filter(class==classes[i])%>%select(time_series_id,time_index,PC1,class)
  random_ids<-sample (unique(PCA_all_ts_a$time_series_id), size=2, replace =F)
  PCA_all_ts_a<-PCA_all_ts_a%>%
    filter(time_series_id %in% random_ids)%>%
    pivot_wider(.,names_from = time_series_id , values_from = PC1 ,names_prefix = "id")

  plot<-ggplot(PCA_all_ts_a,aes_string(x=names(PCA_all_ts_a)[1]))+
    geom_line(aes_string(y=names(PCA_all_ts_a)[3]),size=2,color="#CC6666")+
    geom_line(aes_string(y=names(PCA_all_ts_a)[4]),size=2,color="#9999CC")+
    theme_ipsum()+
    xlab("Time")+
    ylab("PC1")+
    ggtitle(paste("Univariate Time Series from Class",classes[i]))
  print(plot)
}

By looking at the results, time series are quite compatible in e and f classes. However, when the general evaluation is made, although the time series are compatible as patterns, it is seen that a gap occurs between them. Although it gives good results in e and f classes, it is a random situation to give good results because the selected instances are random.

We can say briefly that the PCA explains class patterns well.

5 PART C: PCA Applied on each Gesture Separately

When PCA is applied separately for each class, the first principal component differs between classes. The reason for this is that the weight of the movement on the axes changes in every gesture type. For example, when the a class, the first gesture type, is evaluated, the movement mostly occurs on the y and z axes as it can be seen from the visualization made in Part A. Accordingly, the coefficients of y and z in the first principal component are high for the first gesture.

In short, the first principal component has different axis coefficients in different gesture types because it depends on the weight of the movement in the related axes.

for(i in 1:8){
  
  print(paste("PCA for the Class",classes[i]))
  ts_Position_class<-ts_Position%>%filter(class==classes[i])
  cor(ts_Position_class[,c(10,11,12)])
  
  pca_class <- princomp(as.matrix(ts_Position_class[,c(10,11,12)]),cor=T)

  print(summary(pca_class,loadings=TRUE))
  
  PCA_vis_class<-PCA(ts_Position_class[,c(10,11,12)])
  print(PCA_vis_class$eig)

}
## [1] "PCA for the Class a"
## Importance of components:
##                           Comp.1    Comp.2    Comp.3
## Standard deviation     1.2932492 1.0091473 0.5559931
## Proportion of Variance 0.5574978 0.3394594 0.1030428
## Cumulative Proportion  0.5574978 0.8969572 1.0000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.119  0.973  0.198
## Y_position  0.693 -0.224  0.685
## Z_position  0.711        -0.701

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.6724935               55.74978                          55.74978
## comp 2  1.0183782               33.94594                          89.69572
## comp 3  0.3091283               10.30428                         100.00000
## [1] "PCA for the Class b"
## Importance of components:
##                           Comp.1    Comp.2    Comp.3
## Standard deviation     1.1286537 0.9457021 0.9120243
## Proportion of Variance 0.4246197 0.2981175 0.2772628
## Cumulative Proportion  0.4246197 0.7227372 1.0000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.535  0.814  0.224
## Y_position  0.613 -0.192 -0.767
## Z_position  0.581 -0.548  0.602

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.2738591               42.46197                          42.46197
## comp 2  0.8943525               29.81175                          72.27372
## comp 3  0.8317884               27.72628                         100.00000
## [1] "PCA for the Class c"
## Importance of components:
##                           Comp.1    Comp.2    Comp.3
## Standard deviation     1.2156489 1.0027730 0.7187794
## Proportion of Variance 0.4926008 0.3351846 0.1722146
## Cumulative Proportion  0.4926008 0.8277854 1.0000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.709         0.705
## Y_position  0.371  0.847 -0.380
## Z_position -0.599  0.532  0.599

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.4778023               49.26008                          49.26008
## comp 2  1.0055538               33.51846                          82.77854
## comp 3  0.5166439               17.22146                         100.00000
## [1] "PCA for the Class d"
## Importance of components:
##                           Comp.1    Comp.2    Comp.3
## Standard deviation     1.2855781 0.9914895 0.6035210
## Proportion of Variance 0.5509037 0.3276838 0.1214125
## Cumulative Proportion  0.5509037 0.8785875 1.0000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.702         0.711
## Y_position  0.674  0.285 -0.681
## Z_position -0.230  0.958  0.173

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.6527111               55.09037                          55.09037
## comp 2  0.9830514               32.76838                          87.85875
## comp 3  0.3642376               12.14125                         100.00000
## [1] "PCA for the Class e"
## Importance of components:
##                           Comp.1    Comp.2     Comp.3
## Standard deviation     1.4038307 0.9685920 0.30180949
## Proportion of Variance 0.6569135 0.3127235 0.03036299
## Cumulative Proportion  0.6569135 0.9696370 1.00000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.245  0.970       
## Y_position  0.685 -0.177  0.707
## Z_position  0.686 -0.170 -0.708

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1 1.97074054              65.691351                          65.69135
## comp 2 0.93817049              31.272350                          96.96370
## comp 3 0.09108897               3.036299                         100.00000
## [1] "PCA for the Class f"
## Importance of components:
##                           Comp.1    Comp.2     Comp.3
## Standard deviation     1.3400570 0.9638120 0.52470332
## Proportion of Variance 0.5985843 0.3096445 0.09177119
## Cumulative Proportion  0.5985843 0.9082288 1.00000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.291  0.955       
## Y_position -0.671  0.251 -0.698
## Z_position -0.682  0.162  0.713

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.7957528              59.858428                          59.85843
## comp 2  0.9289336              30.964453                          90.82288
## comp 3  0.2753136               9.177119                         100.00000
## [1] "PCA for the Class g"
## Importance of components:
##                           Comp.1    Comp.2    Comp.3
## Standard deviation     1.2971922 0.8883869 0.7266781
## Proportion of Variance 0.5609026 0.2630771 0.1760203
## Cumulative Proportion  0.5609026 0.8239797 1.0000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.596  0.487  0.639
## Y_position  0.632  0.207 -0.747
## Z_position  0.496 -0.849  0.184

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.6827077               56.09026                          56.09026
## comp 2  0.7892313               26.30771                          82.39797
## comp 3  0.5280610               17.60203                         100.00000
## [1] "PCA for the Class h"
## Importance of components:
##                           Comp.1    Comp.2    Comp.3
## Standard deviation     1.2203138 0.9683089 0.7571078
## Proportion of Variance 0.4963886 0.3125407 0.1910707
## Cumulative Proportion  0.4963886 0.8089293 1.0000000
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## X_position  0.664  0.253  0.704
## Y_position  0.337 -0.941       
## Z_position  0.668  0.223 -0.710

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.4891657               49.63886                          49.63886
## comp 2  0.9376221               31.25407                          80.89293
## comp 3  0.5732122               19.10707                         100.00000

6 PART D: Multidimensional Scaling and Visualization of the Results

First, the distance between time series is calculated using the Euclidean method and the symmetric distance matrix is obtained. Then, MDS is applied to this matrix and all time series are made two-dimensional. In other words, time series are expressed with 2 variables instead of multiple time indexes. In order to see the positions of the time series with respect to each other, they are plotted according to their binary values and colored with the gesture types.

dist_matrix<-dist(mds_data[,-c(1,2)],method = "euclidean")
MDS<-cmdscale(dist_matrix)
colnames(MDS) <- c("x","y")
MDS<-data.frame(MDS)
MDS<-cbind(mds_data[c(1,2)],MDS)
mds_plot<-ggplot(MDS,aes(x=x,y=y))+
  geom_jitter(aes(color=class),size=3)+
  scale_colour_brewer(palette ="Accent" )+
  theme_minimal()+
  labs(color="Gesture Class")

mds_plot

mds_plot+facet_wrap(~class)

For MDS to be successful, the same gesture types must be close to each other. According to plots above obtained as a result of MDS, this method gives very successful results for the gesture types a, b and g, while less successful results for the c, d and f groups in a wider area. However, time series belonging to the same class are not grouped in different regions, they display a collective image. For this reason, MDS becomes successful in making this classification.